Data Mining in Incomplete Numerical and Categorical Data Sets: A Neuro Fuzzy Approach

Authors

  • Pilar Rey del Castillo
  • Jesús Cardeñosa
Abstract

Many applications deal with incomplete data sets and take different approaches to imputing missing values. Most tackle the problem for numerical input variables only. However, when the data set contains two types of input variables, numerical and categorical, the state of the art has provided no clear solutions. This paper presents a proposal for handling incomplete numerical and categorical data sets using an extension of an existing neuro-fuzzy approach. The method is extensively tested in a real environment in the field of political election polls.

Similar resources

Automatic Clustering Subspace for High Dimensional Categorical Data Using Neuro-Fuzzy Classification

Clustering has been used extensively as a vital tool of data mining. Data gathering has been discussed widely, but almost all known conventional clustering algorithms tend to break down in high-dimensional spaces because of the inherent sparsity of the data points. Present subspace clustering methods for handling high-dimensional data focus on numerical dimensions. The minimum spanning...

Clustering Numerical and Categorical Data

Clustering is an important technique for data mining which allows us to discover unknown relationships in our data sets. Clustering algorithms that use metrics based on the natural ordering of numbers cannot be applied to categorical (non-numerical) data. In this tutorial we will review the main methods for numerical data clustering (K-Means, Hierarchical Clustering and Fuzzy C-Means) and then s...

Construction of α-cut fuzzy X control charts based on standard deviation and range using fuzzy triangular numbers

Control charts are one of the most important tools in statistical process control; they help improve process quality and ensure required quality levels. In traditional control charts, all data should be exactly known, whereas there are many quality characteristics that cannot be expressed on a numerical scale, such as characteristics for appearance, softness, and color. Fuzzy sets theory is a...

A Neuro Fuzzy System for Knowledge Discovery of Incomplete Construction Data

This paper tackles problems encountered in mining of incomplete data for knowledge discovery of construction databases. As historical construction data are expensive to collect, any waste of incomplete data means not only loss of knowledge but also increase of costs for knowledge discovery of construction engineering. Unfortunately, incompleteness is commonplace in the existing construction dat...

On a Fuzzy c-means Algorithm for Mixed Incomplete Data Using Partial Distance and Imputation

The fuzzy c-means clustering method is normally applied to numerical data. However, most data existing in databases are both categorical and numerical. To date, clustering methods have been developed to analyze only complete data. Although we sometimes encounter data sets that contain one or more missing feature values (incomplete data), traditional clustering methods cannot be used for s...

Publication date: 2009